
Genome Research

Cold Spring Harbor Laboratory

All preprints, ranked by how well they match Genome Research's content profile, based on 409 papers previously published here. The average preprint has a 0.15% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
Quartet-based Genome-scale Species Tree Inference using Multicopy Gene Family Trees

Rafi, A.; Rumi, A. M. S.; Hakim, S. A.; Bayzid, M. S.

2025-04-10 evolutionary biology 10.1101/2025.04.04.647228 bioRxiv
Top 0.1%
33.6%

Species tree estimation from multi-copy gene family trees, including both paralogs and orthologs, is a challenging task due to the gene tree discordance caused by biological processes such as incomplete lineage sorting (ILS) and gene duplication and loss (GDL). Quartet-based species tree estimation methods, such as ASTRAL, Quartet Max-Cut (QMC), and the Quartet Fiduccia-Mattheyses (QFM) framework, have gained substantial popularity for their accuracy and statistical guarantees. However, most of these methods rely on single-copy gene trees and model only ILS, which limits their applicability to large genomic datasets. ASTRAL-Pro incorporates both orthology and paralogy for species tree inference under GDL by employing a refined quartet similarity measure based on the concept of species-driven quartets (SQs). In this study, we show that these SQ-based techniques can be effectively leveraged within the QFM framework. This required substantial algorithmic re-engineering, including the development of efficient techniques for computing the initial bipartition in QFM and novel combinatorial methods for computing refined quartet scores directly from gene family trees. We extensively evaluated our method, wQFM-GDL, on benchmark simulated and real biological datasets and compared it with ASTRAL-Pro3, SpeciesRax, and DupLoss-2. wQFM-GDL outperforms all other methods in 113 out of 124 model conditions considered in this study, with performance differences becoming more pronounced as dataset size increases. In particular, for larger datasets with 200 and 500 taxa, wQFM-GDL significantly outperforms all leading methods in all 72 model conditions and achieves, on average, nearly a 25% reduction in reconstruction error compared with ASTRAL-Pro3. wQFM-GDL is freely available in open-source form at https://github.com/abdur-rafi/wQFM-GDL.
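The quartet reasoning these methods share can be made concrete with the four-point condition: in an additive tree, the two leaves of any quartet whose cross-distance sum is smallest are paired together. A minimal Python sketch of scoring a candidate species tree against gene trees by quartet agreement (illustrative only; the helper names are ours, and this is not the wQFM-GDL algorithm):

```python
from itertools import combinations

def dmat(pairs):
    # Build a symmetric leaf-to-leaf distance lookup from {(x, y): d} pairs.
    d = {}
    for (x, y), v in pairs.items():
        d.setdefault(x, {})[y] = v
        d.setdefault(y, {})[x] = v
    return d

def quartet_topology(d, a, b, c, e):
    # Four-point condition: the tree pairs the two leaves whose
    # cross-distance sum is smallest, giving ab|ce, ac|be, or ae|bc.
    sums = {
        frozenset({frozenset({a, b}), frozenset({c, e})}): d[a][b] + d[c][e],
        frozenset({frozenset({a, c}), frozenset({b, e})}): d[a][c] + d[b][e],
        frozenset({frozenset({a, e}), frozenset({b, c})}): d[a][e] + d[b][c],
    }
    return min(sums, key=sums.get)

def quartet_score(species_d, gene_ds, taxa):
    # Fraction of (gene tree, quartet) combinations whose induced
    # topology agrees with the candidate species tree.
    agree = total = 0
    for a, b, c, e in combinations(taxa, 4):
        ref = quartet_topology(species_d, a, b, c, e)
        for gd in gene_ds:
            agree += quartet_topology(gd, a, b, c, e) == ref
            total += 1
    return agree / total
```

On four taxa with topology ab|cd, a gene tree with the same topology scores 1.0 against it, and a discordant ac|bd gene tree pulls the average down, which is the signal quartet-based methods aggregate over all quartets.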

2
mm2-gb: GPU Accelerated Minimap2 for Long Read DNA Mapping

Dong, J.; Liu, X.; Sadasivan, H.; Sitaraman, S.; Narayanasamy, S.

2024-03-27 genomics 10.1101/2024.03.23.586366 bioRxiv
Top 0.1%
32.5%

Long-read DNA sequencing is becoming increasingly popular for genetic diagnostics. Minimap2 is the state-of-the-art long-read aligner. However, Minimap2's chaining step is slow on the CPU, taking 40-68% of total runtime, especially for long DNA reads. Prior works accelerating Minimap2 either lose mapping accuracy, are closed source (and not updated), or deliver inconsistent speedups for longer reads. We introduce mm2-gb, which accelerates the chaining step of Minimap2 on GPU without compromising mapping accuracy. In addition to the intra- and inter-read parallelism exploited by prior works, mm2-gb exploits finer levels of parallelism by breaking high-latency large workloads into smaller independent segments that can be run in parallel, and leverages several strategies for better workload balancing, including split kernels and prioritized scheduling of segments by sorted size. We show that mm2-gb on an AMD Instinct MI210 GPU achieves a 2.57-5.33x performance improvement on long nanopore reads (10kb-100kb), and up to a 1.87x performance gain on super-long reads (100kb-300kb), compared to SIMD-accelerated mm2-fast. mm2-gb is open-sourced and available at https://github.com/Minimap2onGPU/mm2-gb.
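The chaining step in question is, at its core, an O(n²) dynamic program over seed anchors, which is what makes it both slow on CPUs and amenable to GPU parallelism. A simplified sequential sketch (fixed match score and linear gap cost; minimap2's actual scoring function is more elaborate):

```python
def chain_score(anchors, match=1, gap_cost=0.01):
    # O(n^2) chaining DP over (reference_pos, query_pos) anchors:
    # best[i] = best score of any chain ending at anchor i.
    anchors = sorted(anchors)                # process in reference order
    best = [0.0] * len(anchors)
    for i, (xi, yi) in enumerate(anchors):
        best[i] = match                      # chain of just anchor i
        for j in range(i):
            xj, yj = anchors[j]
            if xj < xi and yj < yi:          # j may precede i in a chain
                gap = abs((xi - xj) - (yi - yj))
                cand = best[j] + match - gap_cost * gap
                if cand > best[i]:
                    best[i] = cand
    return max(best, default=0.0)
```

Each `best[i]` depends on all earlier anchors, so naive parallelization stalls on long reads with many anchors; splitting the anchor array into independent segments, as mm2-gb does, is what exposes the finer-grained parallelism described above.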

3
Accelerating Identification of Chromatin Accessibility from noisy ATAC-seq Data using Modern CPUs

Chaudhary, N.; Misra, S.; Kalamkar, D.; Heinecke, A.; Georganas, E.; Ziv, B.; Adelman, M.; Kaul, B.

2021-09-30 genomics 10.1101/2021.09.28.462099 bioRxiv
Top 0.1%
27.8%

Identifying accessible chromatin regions is a fundamental problem in epigenomics, with ATAC-seq being a commonly used assay. The exponential rise in single-cell ATAC-seq experiments has made it critical to accelerate processing of ATAC-seq data. ATAC-seq data can have a low signal-to-noise ratio for various reasons, including low coverage or low cell count. To denoise and identify accessible chromatin regions from noisy ATAC-seq data, deep learning on 1D data - using large filter sizes, long tensor widths, and/or dilation - has recently been proposed. Here, we present ways to accelerate the end-to-end training performance of these deep learning-based methods using CPUs. We evaluate our approach on the recently released AtacWorks toolkit. Compared to an Nvidia DGX-1 box with 8 V100 GPUs, we achieve up to a 2.27x speedup using just 16 CPU sockets. To achieve this, we build an efficient 1D dilated convolution layer and demonstrate reduced-precision (BFloat16) training.
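The primitive at the heart of this work, a 1D dilated convolution, reads input taps spaced `dilation` positions apart, widening the receptive field without extra weights. A pure-Python reference version (illustrative only, not the optimized CPU kernel from the paper):

```python
def dilated_conv1d(signal, kernel, dilation=1):
    # 'Valid' 1-D convolution with dilated taps:
    # output[i] = sum over k of kernel[k] * signal[i + k * dilation].
    span = (len(kernel) - 1) * dilation      # input positions one output covers
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span)
    ]
```

With `dilation=2` a two-tap kernel skips every other sample, so a stack of such layers covers long genomic windows cheaply, which is why dilation appears alongside large filters in these denoising models.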

4
10-minimizers: a promising class of constant-space minimizers

Shur, A.; Tziony, I.; Orenstein, Y.

2026-03-18 bioinformatics 10.64898/2026.03.16.712052 bioRxiv
Top 0.1%
25.8%

Minimizers are sampling schemes that are ubiquitous in almost any high-throughput sequencing analysis. Assuming a fixed alphabet of size σ, a minimizer is defined by two positive integers k, w and a linear order ρ on k-mers. A sequence is processed by a sliding-window algorithm that chooses in each window of length w + k - 1 its minimal k-mer with respect to ρ. A key characteristic of a minimizer is its density, the expected frequency of chosen k-mers among all k-mers in a random infinite σ-ary sequence. Minimizers of smaller density are preferred, as they produce smaller samples, which reduce runtime and memory usage in downstream applications. Recent studies developed methods to generate minimizers with optimal and near-optimal densities, but these require explicitly storing k-mer ranks in Ω(2^k) space. While constant-space minimizers exist, and some of them are provably asymptotically optimal, no constant-space minimizer has been proven to guarantee lower density than a random minimizer in the non-asymptotic regime, and many minimizer schemes suffer from long k-mer key-retrieval times due to complex computation. In this paper, we introduce 10-minimizers, a class of minimizers with promising properties. First, we prove that for every k > 1 and every w ≥ k - 2, a random 10-minimizer has, in expectation, lower density than a random minimizer. This is the first provable guarantee for a class of minimizers in the non-asymptotic regime. Second, we present spacers, particular 10-minimizers combining three desirable properties: they are constant-space, low-density, and have small k-mer key-retrieval time. In terms of density, spacers are competitive with the best known constant-space minimizers; in certain (k, w) regimes they achieve the lowest density among all known (not necessarily constant-space) minimizers. Notably, we are the first to benchmark constant-space minimizers on the time spent in k-mer key retrieval, the most fundamental operation in many minimizer-based methods. Our empirical results show that spacers retrieve k-mer keys in competitive time (a few seconds per genome-size sequence, less than random minimizers require) for all practical values of k and w. We expect 10-minimizers to improve minimizer-based methods, especially those using large window sizes. We also propose the k-mer key-retrieval benchmark as a standard objective for any new minimizer scheme.
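The sliding-window scheme described above is easy to state in code. A sketch of a random minimizer, with the order ρ given by a hash (our own illustration, not the 10-minimizer or spacer construction):

```python
import hashlib

def rho(kmer):
    # A pseudo-random linear order on k-mers via hashing
    # (this is what makes it a "random minimizer").
    return hashlib.blake2b(kmer.encode(), digest_size=8).digest()

def minimizer_positions(seq, k, w):
    # Slide every window of w consecutive k-mers (length w + k - 1)
    # and keep the position of its rho-minimal k-mer.
    # Density is len(result) / number of k-mers in seq.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for s in range(len(kmers) - w + 1):
        picked.add(min(range(s, s + w), key=lambda i: rho(kmers[i])))
    return sorted(picked)
```

Because every window contributes a pick, consecutive selected positions are at most w apart, which is the window guarantee that lets downstream tools recover sufficiently long matches from the sample.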

5
One Cell At a Time: A Unified Framework to Integrate and Analyze Single-cell RNA-seq Data

Wang, C. X.; Zhang, L.; Wang, B.

2021-07-16 genomics 10.1101/2021.05.12.443814 bioRxiv
Top 0.1%
25.7%

The surge of single-cell RNA sequencing technologies gives rise to an abundance of large single-cell RNA-seq datasets at the scale of hundreds of thousands of single cells. Integrative analysis of large-scale scRNA-seq datasets has the potential of revealing de novo cell types as well as aggregating biological information. However, most existing methods fail to integrate multiple large-scale scRNA-seq datasets in a computationally and memory-efficient way. We hereby propose OCAT, One Cell At a Time, a graph-based method that sparsely encodes single-cell gene expression to integrate data from multiple sources without most-variable-gene selection or explicit batch effect correction. We demonstrate that OCAT efficiently integrates multiple scRNA-seq datasets and achieves state-of-the-art performance in cell type clustering, especially in challenging scenarios of non-overlapping cell types. In addition, OCAT efficaciously facilitates a variety of downstream analyses, such as differential gene analysis, trajectory inference, pseudotime inference and cell inference. OCAT is a unifying tool to simplify and expedite the analysis of large-scale scRNA-seq data from heterogeneous sources.

6
Correlation imputation in single cell RNA-seq using auxiliary information and ensemble learning

Gan, L.; Vinci, G.; Allen, G. I.

2020-09-04 genomics 10.1101/2020.09.03.282178 bioRxiv
Top 0.1%
25.1%

Single-cell RNA sequencing is a powerful technique that measures the gene expression of individual cells in a high-throughput fashion. However, due to sequencing inefficiency, the data suffers from dropout events: technical artifacts in which genes erroneously appear to have zero expression. Many data imputation methods have been proposed to alleviate this issue. Yet, effective imputation can be difficult and biased because the data is sparse and high-dimensional, resulting in major distortions in downstream analyses. In this paper, we propose a completely novel approach that imputes the gene-by-gene correlations rather than the data itself. We call this method SCENA: Single cell RNA-seq Correlation completion by ENsemble learning and Auxiliary information. The SCENA gene-by-gene correlation matrix estimate is obtained by model stacking of multiple imputed correlation matrices based on known auxiliary information about gene connections. In an extensive simulation study based on real scRNA-seq data, we demonstrate that SCENA not only accurately imputes gene correlations but also outperforms existing imputation approaches in downstream analyses such as dimension reduction, cell clustering, and graphical model estimation.

7
GeneFriends 2021: Updated co-expression databases and tools for human and mouse genes and transcripts

Raina, P.; Lopes, I.; Chatsirisupachai, K.; Farooq, Z.; de Magalhaes, J. P.

2021-01-10 genomics 10.1101/2021.01.10.426125 bioRxiv
Top 0.1%
23.2%

Gene co-expression analysis has emerged as a powerful method to provide insights into gene function and regulation. The rapid growth of publicly available RNA-sequencing (RNA-seq) data has created opportunities for researchers to employ this abundant data to help decipher the complexity and biology of genomes. Co-expression networks have proven effective for inferring relationships between genes, for gene prioritization, and for assigning function to poorly annotated genes based on their co-expressed partners. To facilitate such analyses we previously created an online co-expression tool for humans and mice entitled GeneFriends. To continue providing a valuable tool to the scientific community, we have updated the GeneFriends database. Here, we present the latest version of GeneFriends, which includes updated gene and transcript co-expression networks based on RNA-seq data from 46,475 human and 34,322 mouse samples. GeneFriends is freely available at http://www.genefriends.org/
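In its simplest form, a co-expression network links gene pairs whose expression profiles correlate strongly across samples. A toy sketch (GeneFriends builds its networks from tens of thousands of RNA-seq samples with its own methodology; the gene names and threshold here are arbitrary):

```python
from math import sqrt

def pearson(x, y):
    # Pearson correlation of two equal-length expression vectors.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sqrt(sum((a - mx) ** 2 for a in x))
    vy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (vx * vy)

def coexpression_edges(expr, threshold=0.9):
    # expr: gene -> expression vector across samples. Return gene
    # pairs whose |Pearson r| meets the threshold (network edges).
    genes = sorted(expr)
    return [
        (g1, g2)
        for i, g1 in enumerate(genes)
        for g2 in genes[i + 1:]
        if abs(pearson(expr[g1], expr[g2])) >= threshold
    ]
```

"Guilt by association" then assigns candidate functions to a poorly annotated gene from the annotations of its edge partners.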

8
Movi 2: Fast and Space-Efficient Queries on Pangenomes

Zakeri, M.; Brown, N. K.; Gagie, T.; Langmead, B.

2025-10-30 genomics 10.1101/2025.10.16.682873 bioRxiv
Top 0.1%
22.8%

Space-efficient compressed indexing methods are critical for pangenomics and for avoiding reference bias. In the Movi study, we implemented the move-structure index, highlighting its locality of reference and speed. However, Movi had a high memory footprint compared to other compressed indexes. Here we introduce Movi 2 and describe new methods that greatly reduce the size and memory footprint of move-structure-based indexes. The most compressed version of Movi 2 reduces the Movi index's space footprint more than fivefold. We also introduce sampling approaches that enable trade-offs between query and space efficiency. To demonstrate, we show that Movi 2 achieves advantageous time and space tradeoffs when applied to large pangenome collections, including both the first and second releases of the Human Pangenome Reference Consortium (HPRC) collection, the latter of which spans over 460 human haplotypes. We show that Movi 2 dominates prior methods on both speed and memory footprint, including both r-index-based methods and our previous move-structure-based method. The methods we developed for Movi 2 are publicly available at https://github.com/mohsenzakeri/Movi.

9
GenVarLoader: An accelerated dataloader for applying deep learning to personalized genomics

Laub, D.; Ho, A.; Jaureguy, J.; Klie, A.; Salem, R. M.; McVicker, G.; Carter, H.

2025-01-17 genomics 10.1101/2025.01.15.633240 bioRxiv
Top 0.1%
22.8%

Deep learning sequence models trained on personalized genomics can improve variant effect prediction; however, applications of these models are limited by the computational requirements for storing and reading large datasets. We address this with GenVarLoader, which stores personalized genomic data in new memory-mapped formats with optimal data locality to achieve ~1,000x faster throughput and ~2,000x better compression compared to existing alternatives.

10
Minimizing Reference Bias with an Impute-First Approach

Vaddadi, N. S. K.; Mun, T.; Langmead, B.

2023-12-02 bioinformatics 10.1101/2023.11.30.568362 bioRxiv
Top 0.1%
22.6%

Pangenome indexes reduce reference bias in sequencing data analysis. However, bias can be reduced further by using a personalized reference, e.g. a diploid human reference constructed to match a donor individual's alleles. We present a novel impute-first alignment framework that combines elements of genotype imputation and pangenome alignment. It begins by genotyping the individual using only a subsample of the input reads. It next uses a reference panel and an efficient imputation algorithm to impute a personalized diploid reference. Finally, it indexes the personalized reference and applies a read aligner, which could be a linear or graph aligner, to align the full read set to the personalized reference. This framework achieves higher variant-calling recall (99.54% vs. 99.37%), precision (99.36% vs. 99.18%), and F1 (99.45% vs. 99.28%) compared to a graph pangenome aligner. The personalized reference is also smaller and faster to query compared to a pangenome index, making it an overall advantageous choice for whole-genome DNA sequencing experiments.

11
Robust and cost-efficient single-cell sequencing through combinatorial pooling

Gawron, J.; Cunha, L.; Borgsmueller, N.; Beerenwinkel, N.

2024-11-23 bioinformatics 10.1101/2024.11.22.624460 bioRxiv
Top 0.1%
22.6%

Single-cell sequencing is widely used to study molecular cell-to-cell heterogeneity. Even though the cost of sequencing has dropped over the last decades, single-cell assays remain expensive, because they require strategies to index molecules by cells. The high costs of indexing can be mitigated by pooling samples prior to sequencing library preparation. Computational methods have been developed to leverage molecular features that are distinct between different samples to separate the pools into distinct datasets. However, since all multiplexed samples are processed in the same way, information on the origin of each demultiplexed dataset is lost. To map datasets to their sample of origin, additional information such as molecular indexing or additional genotyping is needed. Here, we propose a class of experimental designs that allows identifying the sample of origin of each demultiplexed dataset, relying only on the genetic profiles of the samples and the composition of pools. Our approach is based on splitting and pooling samples in specific combinations. We identify the most cost-efficient experimental design in this class and prove its optimality. We present a dynamic programming algorithm to iteratively simplify an optimal experimental design by breaking it into several independent designs while maintaining optimality. Furthermore, we propose a subclass of experimental designs which allow robust sample identification even under partial failure of the experiment and present a provably optimal design in this subclass. We provide an implementation for automatic sample identification under these optimal combinatorial pooling strategies and demonstrate its functionality in a simulation study.
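One simple way to make pool composition identify samples is to give each sample a distinct subset of pools, so its presence/absence pattern across pools forms a unique code. A sketch of that idea (a simplified combinatorial design of our own, not the provably optimal designs of the paper):

```python
from itertools import combinations

def pooling_design(n_samples, n_pools, splits):
    # Assign each sample a distinct subset of `splits` pools out of
    # n_pools; the pattern of pools a sample appears in then uniquely
    # identifies its sample of origin.
    codes = list(combinations(range(n_pools), splits))
    if n_samples > len(codes):
        raise ValueError("not enough distinct pool subsets")
    return {s: set(codes[s]) for s in range(n_samples)}

def identify(design, observed_pools):
    # Map an observed pool-membership pattern back to its sample.
    matches = [s for s, pools in design.items() if pools == observed_pools]
    assert len(matches) == 1, "pattern must match exactly one sample"
    return matches[0]
```

With 4 pools and 2 splits per sample, C(4, 2) = 6 samples can be uniquely encoded; the paper's contribution is finding the cost-optimal and failure-robust designs within this kind of combinatorial family.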

12
CSCN: Inference of Cell-Specific Causal Networks Using Single-Cell RNA-Seq Data

Wang, M.; Yang, J.; Lyu, L.; Chen, J.

2025-10-09 bioinformatics 10.1101/2025.10.09.681381 bioRxiv
Top 0.1%
22.5%

Understanding gene regulation is fundamental to deciphering the coordinated activity of genes within cells. Although single-cell RNA sequencing (scRNA-seq) enables gene expression profiling at cellular resolution, most gene network inference methods operate at the tissue or population level, thereby overlooking regulatory heterogeneity across individual cells. Recent approaches, such as Cell-Specific Network (CSN) and its extension c-CSN, attempt to construct gene networks at single-cell resolution, providing a more detailed view of the regulatory logic underlying individual cellular states. However, these methods remain limited by high false positive rates due to indirect associations and lack of directionality or causal interpretability. To address these issues, we propose the Cell-Specific Causal Network (CSCN) framework, which infers directed, cell-specific gene regulatory relationships by explicitly modeling causality. CSCN combines causal discovery techniques with efficient computation using kd-trees and bitmap indexing to perform conditional independence testing, yielding sparse and interpretable causal graphs for each cell that effectively suppress indirect and spurious associations. We demonstrate through simulations that CSCN significantly reduces false positives compared to existing methods. Furthermore, we evaluate the quality of the inferred causal networks via clustering on the Causal Katz Matrix (CKM), and CSCN outperforms CSN and c-CSN in distinguishing cellular states.

13
DeepMinimizer: A Differentiable Framework for Optimizing Sequence-Specific Minimizer Schemes

Hoang, M.; Zheng, H.; Kingsford, C.

2022-02-19 bioinformatics 10.1101/2022.02.17.480870 bioRxiv
Top 0.1%
22.4%

Minimizers are k-mer sampling schemes designed to generate sketches for large sequences that preserve sufficiently long matches between sequences. Despite their widespread application, learning an effective minimizer scheme with optimal sketch size is still an open question. Most work in this direction focuses on designing schemes that work well on expectation over random sequences, which have limited applicability to many practical tools. On the other hand, several methods have been proposed to construct minimizer schemes for a specific target sequence. These methods, however, require greedy approximations to solve an intractable discrete optimization problem on the permutation space of k-mer orderings. To address this challenge, we propose: (a) a reformulation of the combinatorial solution space using a deep neural network re-parameterization; and (b) a fully differentiable approximation of the discrete objective. We demonstrate that our framework, DeepMinimizer, discovers minimizer schemes that significantly outperform state-of-the-art constructions on genomic sequences.

14
Profiling the epigenome at home

Henikoff, S.; Henikoff, J. G.

2020-04-17 genomics 10.1101/2020.04.15.043083 bioRxiv
Top 0.1%
22.2%

Chromatin accessibility mapping is a powerful approach to identify potential regulatory elements. A popular example is ATAC-seq, whereby Tn5 transposase inserts sequencing adapters into accessible DNA ("tagmentation"). CUT&Tag is a tagmentation-based epigenomic profiling method in which antibody tethering of Tn5 to a chromatin epitope of interest profiles specific chromatin features in small samples and single cells. Here we show that by simply modifying the tagmentation conditions for histone H3K4me2 or H3K4me3 CUT&Tag, antibody-tethered tagmentation of accessible DNA sites is redirected to produce chromatin accessibility maps that are indistinguishable from the best ATAC-seq maps. Thus, chromatin accessibility maps can be produced in parallel with CUT&Tag maps of other epitopes, with all steps from nuclei to amplified sequencing-ready libraries performed in single PCR tubes in the laboratory or on a home workbench. As H3K4 methylation is produced by transcription at promoters and enhancers, our method identifies transcription-coupled accessible regulatory sites.

15
Accelerating long-read analysis on modern CPUs

Kalikar, S.; Jain, C.; Md, V.; Misra, S.

2021-07-23 genomics 10.1101/2021.07.21.453294 bioRxiv
Top 0.1%
22.2%

Long-read sequencing is now routinely used at scale for genomics and transcriptomics applications. Mapping long reads or a draft genome assembly to a reference sequence is often one of the most time-consuming steps in these applications. Here, we present techniques to accelerate minimap2, a widely used software for mapping. We present multiple optimizations using SIMD parallelization, efficient cache utilization and a learned index data structure to accelerate its three main computational modules, i.e., seeding, chaining and pairwise sequence alignment. These reduce minimap2's end-to-end mapping time by up to 1.8x while maintaining identical output.

16
Robust and Accurate Doublet Detection of Single-Cell Sequencing Data via Maximizing Area Under Precision-Recall Curve

CHEN, Y.; Wu, X.; Ni, K.; Hu, H.; Yue, M.; Chen, W.; Huang, H.

2023-11-02 bioinformatics 10.1101/2023.10.30.564840 bioRxiv
Top 0.1%
22.0%

Single-cell sequencing has revolutionized our understanding of cellular heterogeneity by offering detailed profiles of individual cells within diverse specimens. However, due to the limitations of sequencing technology, two or more cells may be captured in the same droplet and share the same barcode. These incidents, termed doublets or multiplets, can lead to artifacts in single-cell data analysis. While explicit experimental design can mitigate these issues with the help of auxiliary cell markers, computationally annotating doublets has a broad impact on analyzing the existing public single-cell data and reduces potential experimental costs. Considering that doublets form only a minor fraction of the total dataset, we argue that current doublet detection methods, primarily focused on optimizing classification accuracy, might be inefficient in performing well on the inherently imbalanced data in the area under the precision-recall curve (AUPRC) metric. To address this, we introduce RADO (Robust and Accurate DOublet detection) - an algorithm designed to annotate doublets by maximizing the AUPRC, effectively tackling the imbalance challenge. Benchmarked on 18 public datasets, RADO outperforms other methods in terms of doublet score and achieves similar performance to the current best methods in doublet calling. Furthermore, beyond its application in single-cell RNA-seq data, we demonstrate RADO's adaptability to single-cell assays for transposase-accessible chromatin sequencing (scATAC-seq) data, where it outperforms other scATAC-seq doublet detection methods. RADO's open-source implementation is available at: https://github.com/poseidonchan/RADO.
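The AUPRC objective itself can be evaluated with the average-precision estimator: sum the precision at each true positive in descending score order, then divide by the number of positives. A small sketch of the metric (the evaluation target, not RADO's training procedure):

```python
def average_precision(labels, scores):
    # Average precision, an estimator of the area under the
    # precision-recall curve: labels are 0/1, scores are rankings.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    total_pos = sum(labels)
    ap = 0.0
    for i in order:
        if labels[i]:
            tp += 1
            ap += tp / (tp + fp)   # precision at this true positive
        else:
            fp += 1
    return ap / total_pos
```

Unlike accuracy, this metric is insensitive to the large pool of true negatives, which is why it suits rare-event problems like doublet detection where positives are a small minority.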

17
LRMD: Reference-Free Misassembly Detection Based on Multiple Features from Long-Read Alignments

Wang, J.; Nie, F.; Shi, X.

2025-11-08 genomics 10.1101/2025.11.07.686952 bioRxiv
Top 0.1%
22.0%

Genome assembly serves as the cornerstone of genomics research, with the detection of misassemblies playing a crucial role in downstream analyses. Reference-free methods for misassembly detection, leveraging read alignments, circumvent the need for high-quality reference genomes and thus have broad applicability. However, existing methods struggle to effectively utilize alignment data, leading to a noticeable deficiency in sensitivity for detecting misassemblies. We introduce LRMD, a novel reference-free tool for misassembly detection. LRMD integrates depth, clipping, and read pileup information derived from long-read-to-assembly alignments to significantly enhance sensitivity in identifying misassemblies. Experimental evaluations on both simulated and real datasets demonstrate that LRMD consistently outperforms existing tools in terms of sensitivity and F1-score. Notably, its results are closest to those of the reference-based evaluation tool QUAST. As an evaluation tool, LRMD also outputs metrics such as base quality, assembly size, contig N50, and others. LRMD is publicly available at http://github.com/sxfss/LRMD.

18
Generation and analysis of a mouse multi-tissue genome annotation atlas

Adams, M. S.; Vollmers, C.

2024-02-01 genomics 10.1101/2024.01.31.578267 bioRxiv
Top 0.1%
22.0%

Generating an accurate and complete genome annotation for an organism is complex because the cells within each tissue can express a unique set of transcript isoforms from a unique set of genes. A comprehensive genome annotation should contain information on what tissues express what transcript isoforms at what level. This tissue-level isoform information can then inform a wide range of research questions as well as experiment designs. Long-read sequencing technology combined with advanced full-length cDNA library preparation methods has now achieved throughput and accuracy where generating these types of annotations is achievable. Here, we show this by generating a genome annotation of the mouse (Mus musculus). We used the nanopore-based R2C2 long-read sequencing method to generate 64 million highly accurate full length cDNA consensus reads - averaging 5.4 million reads per tissue for a dozen tissues. Using the Mandalorion tool we processed these reads to generate the Tissue-level Atlas of Mouse Isoforms (TAMI - available at https://genome.ucsc.edu/s/vollmers/TAMI) which we believe will be a valuable complement to conventional, manually curated reference genome annotations.

19
Pan-cell type continuous chromatin state annotation of all IHEC epigenomes

Daneshpajouh, H.; Moghul, I.; Wiese, K. C.; Libbrecht, M. W.

2025-02-08 genomics 10.1101/2025.02.06.636950 bioRxiv
Top 0.1%
21.9%

The International Human Epigenome Consortium has generated thousands of epigenomic datasets that measure various biochemical activities in the genome, including transcription factor binding, histone modification, and DNA accessibility. Currently, the predominant methods for integrating these datasets to annotate regulatory elements are segmentation and genome annotation (SAGA) algorithms. The majority of annotations by these methods are cell type-specific. However, as the number of profiled cell types has grown into the thousands, using thousands of cell type-specific chromatin state annotations proves undesirable for many applications. Here, we present a pan-cell type annotation that summarizes all IHEC epigenomes using the recently-developed method, epigenome-ssm.

20
sc-REnF: An entropy guided robust feature selection for clustering of single-cell rna-seq data

Lall, S.; Ghosh, A.; Ray, S.; Bandyopadhyay, S.

2020-10-10 bioinformatics 10.1101/2020.10.10.334573 bioRxiv
Top 0.1%
21.8%

Many single-cell typing methods require pure clustering of cells, which is susceptible to technical noise and heavily dependent on the high-quality informative genes selected in the preliminary steps of downstream analysis. Techniques for gene selection in single-cell RNA sequencing (scRNA-seq) data are often simplistic, which causes problems for the resolution of (sub)type detection and marker selection, and ultimately affects cell annotation. We introduce sc-REnF, a novel and robust entropy-based feature (gene) selection method that leverages the established advantages of Renyi and Tsallis entropy for single-cell clustering. Gene selection thereby becomes robust and less sensitive to the technical noise present in the data, producing a pure clustering of cells and classifying independent, unknown samples with high accuracy. The corresponding software is available at: https://github.com/Snehalikalall/sc-REnF
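The Renyi family of entropies, which recovers Shannon entropy as α → 1, underlies sc-REnF's selection criteria. A sketch of the measure itself (the definition only, not the sc-REnF gene-selection pipeline):

```python
from math import log

def renyi_entropy(probs, alpha):
    # Renyi entropy H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha).
    # The limit alpha -> 1 recovers Shannon entropy, handled as a
    # special case here.
    if alpha == 1:
        return -sum(p * log(p) for p in probs if p > 0)
    return log(sum(p ** alpha for p in probs if p > 0)) / (1 - alpha)
```

Varying α changes how strongly rare versus common expression states weigh in, which is the tunable robustness such entropy-guided selection exploits; for a uniform distribution every α gives the same value, log of the number of states.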